Smart Paper Notes

A Relative Church-Turing-Deutsch Thesis from Special Relativity and Undecidability

Blake Wilson, Ethan Dickey, Vaishnavi Iyer, Sabre Kais

Category: Artificial Intelligence

2022-06-13

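Beginning with Turing's seminal work in 1950, artificial intelligence has proposed that consciousness can be simulated by a Turing machine. This implies a potential theory of everything in which the universe is a simulation on a computer, which raises the question of whether we can prove that we exist in a simulation. In this work, we construct a relative model of computation in which a computable local machine is simulated by a global, classical Turing machine. We show that the problem of the local machine computing simulation properties of its global simulator is undecidable in the same sense as the halting problem. We then show that computing the time, space, or error accumulated by the global simulator are simulation properties and are therefore undecidable. These simulation properties give rise to special relativistic effects in the relative model, which we use to construct a relative Church-Turing-Deutsch thesis in which a global, classical Turing machine computes quantum mechanics for a local machine with the same constant-time local computational complexity as experienced in our universe.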

translated by Google Translate

OPT-IML: Scaling Language Model Instruction Meta Learning through the Lens of Generalization

Srinivasan Iyer, Xi Victoria Lin, Ramakanth Pasunuru, Todor Mihaylov, Daniel Simig, Ping Yu, Kurt Shuster, Tianlu Wang, Qing Liu, Punit Singh Koura

Category: Natural Language Processing

2022-12-22

Recent work has shown that fine-tuning large pre-trained language models on a collection of tasks described via instructions, a.k.a. instruction-tuning, improves their zero- and few-shot generalization to unseen tasks. However, there is a limited understanding of the performance trade-offs of different decisions made during the instruction-tuning process. These decisions include the scale and diversity of the instruction-tuning benchmark, different task sampling strategies, fine-tuning with and without demonstrations, training using specialized datasets for reasoning and dialogue, and finally, the fine-tuning objectives themselves. In this paper, we characterize the effect of instruction-tuning decisions on downstream task performance when scaling both model and benchmark sizes. To this end, we create OPT-IML Bench: a large benchmark for Instruction Meta-Learning (IML) of 2000 NLP tasks consolidated into task categories from 8 existing benchmarks, and prepare an evaluation framework to measure three types of model generalizations: to tasks from fully held-out categories, to held-out tasks from seen categories, and to held-out instances from seen tasks. Through the lens of this framework, we first present insights about instruction-tuning decisions as applied to OPT-30B and further exploit these insights to train OPT-IML 30B and 175B, which are instruction-tuned versions of OPT. OPT-IML demonstrates all three generalization abilities at both scales on four different evaluation benchmarks with diverse tasks and input formats: PromptSource, FLAN, Super-NaturalInstructions, and UnifiedSKG. Not only does it significantly outperform OPT on all benchmarks, but it is also highly competitive with existing models fine-tuned on each specific benchmark. We release OPT-IML at both scales, together with the OPT-IML Bench evaluation framework.

Decoding surface codes with deep reinforcement learning and probabilistic policy reuse

Elisha Siddiqui Matekole, Esther Ye, Ramya Iyer, Samuel Yen-Chi Chen

Category: Artificial Intelligence | Machine Learning | Neural and Evolutionary Computing

2022-12-22

Quantum computing (QC) promises significant advantages over classical computers on certain hard computational tasks. However, current quantum hardware, also known as noisy intermediate-scale quantum (NISQ) computers, is still unable to carry out computations faithfully, mainly because it lacks quantum error correction (QEC) capability. A significant number of theoretical studies have provided various types of QEC codes; one of the notable topological codes is the surface code, and its features, such as the requirement of only nearest-neighbor two-qubit control gates and a large error threshold, make it a leading candidate for scalable quantum computation. Recently, machine learning (ML)-based techniques, especially reinforcement learning (RL) methods, have been applied to the decoding problem and have already made some progress. Nevertheless, the device noise pattern may change over time, making trained decoder models ineffective. In this paper, we propose a continual reinforcement learning method to address these decoding challenges. Specifically, we implement a double deep Q-learning with probabilistic policy reuse (DDQN-PPR) model to learn surface code decoding strategies for quantum environments with varying noise patterns. Through numerical simulations, we show that the proposed DDQN-PPR model can significantly reduce the computational complexity. Moreover, increasing the number of trained policies can further improve the agent's performance. Our results open a way to build more capable RL agents that can leverage previously gained knowledge to tackle QEC challenges.
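The two ingredients named in this abstract, double deep Q-learning and probabilistic policy reuse, can be illustrated with a minimal sketch. It assumes the textbook double DQN target (the online network selects the next action, the target network evaluates it) and a simple reuse rule (follow a previously trained policy with probability `psi`, otherwise act epsilon-greedily). The function names, parameters, and Q-value lists are hypothetical and are not taken from the paper's implementation.

```python
import random

def double_dqn_target(reward, next_q_online, next_q_target, gamma=0.99, done=False):
    """Double DQN target: the online network picks the action, the target network scores it."""
    if done:
        return reward
    best_action = max(range(len(next_q_online)), key=lambda a: next_q_online[a])
    return reward + gamma * next_q_target[best_action]

def reuse_action(past_policy_action, q_values, psi, epsilon, rng=random):
    """With probability psi reuse the past policy's action, else act epsilon-greedily."""
    if rng.random() < psi:
        return past_policy_action          # exploit previously gained knowledge
    if rng.random() < epsilon:
        return rng.randrange(len(q_values))  # random exploration
    return max(range(len(q_values)), key=lambda a: q_values[a])  # greedy on current Q
```

In a typical policy-reuse setup `psi` is decayed over the course of an episode so the agent gradually relies on its own Q-values; that schedule is omitted here.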

Cattle Detection Occlusion Problem

Aparna Mendu, Bhavya Sehgal, Vaishnavi Mendu

Category: Computer Vision

2022-12-21

The management of cattle over a huge area is still a challenging problem in the farming sector. As technology evolves, unmanned aerial vehicles (UAVs) with consumer-level digital cameras are becoming a popular alternative to manual animal censuses for livestock estimation, since they are less risky and less expensive. This paper evaluates and compares cutting-edge object detection algorithms: YOLOv7, RetinaNet with a ResNet50 backbone, RetinaNet with EfficientNet, and Mask R-CNN. It aims to address the occlusion problem, that is, detecting hidden cattle in a huge dataset captured by drones, using deep learning algorithms for accurate cattle detection. Experimental results showed that YOLOv7 was superior, with a precision of 0.612, when compared to the other two algorithms. The proposed method proved superior to the usual competing algorithms for cow face detection, especially in very difficult cases.

Maximal Initial Learning Rates in Deep ReLU Networks

Gaurav Iyer, Boris Hanin, David Rolnick

Category: Machine Learning (Statistics) | Machine Learning

2022-12-14

Training a neural network requires choosing a suitable learning rate, involving a trade-off between speed and effectiveness of convergence. While there has been considerable theoretical and empirical analysis of how large the learning rate can be, most prior work focuses only on late-stage training. In this work, we introduce the maximal initial learning rate $\eta^{\ast}$ - the largest learning rate at which a randomly initialized neural network can successfully begin training and achieve (at least) a given threshold accuracy. Using a simple approach to estimate $\eta^{\ast}$, we observe that in constant-width fully-connected ReLU networks, $\eta^{\ast}$ demonstrates different behavior to the maximum learning rate later in training. Specifically, we find that $\eta^{\ast}$ is well predicted as a power of $(\text{depth} \times \text{width})$, provided that (i) the width of the network is sufficiently large compared to the depth, and (ii) the input layer of the network is trained at a relatively small learning rate. We further analyze the relationship between $\eta^{\ast}$ and the sharpness $\lambda_{1}$ of the network at initialization, indicating that they are closely though not inversely related. We formally prove bounds for $\lambda_{1}$ in terms of $(\text{depth} \times \text{width})$ that align with our empirical results.
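A relationship of the form "a power of (depth × width)" is usually checked with a straight-line fit in log-log space. The sketch below shows that fitting step only. The helper name `fit_power_law` is hypothetical, and the numbers in the usage note are synthetic values generated from an arbitrary exponent, not results from the paper.

```python
import math

def fit_power_law(xs, ys):
    """Least-squares fit of log y = log c + p * log x; returns (c, p) for y ~ c * x**p."""
    lx = [math.log(x) for x in xs]
    ly = [math.log(y) for y in ys]
    n = len(lx)
    mean_x, mean_y = sum(lx) / n, sum(ly) / n
    # Slope of the regression line in log-log space is the power-law exponent.
    p = sum((a - mean_x) * (b - mean_y) for a, b in zip(lx, ly)) / sum(
        (a - mean_x) ** 2 for a in lx
    )
    c = math.exp(mean_y - p * mean_x)
    return c, p
```

For example, with `xs` holding depth × width products and `ys = [2.0 * x ** -0.5 for x in xs]` (synthetic data), the fit recovers a prefactor close to 2.0 and an exponent close to -0.5.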

Demystifying Prompts in Language Models via Perplexity Estimation

Hila Gonen, Srini Iyer, Terra Blevins, Noah A. Smith, Luke Zettlemoyer

Category: Natural Language Processing

2022-12-08

Language models can be prompted to perform a wide variety of zero- and few-shot learning problems. However, performance varies significantly with the choice of prompt, and we do not yet understand why this happens or how to pick the best prompts. In this work, we analyze the factors that contribute to this variance and establish a new empirical hypothesis: the performance of a prompt is coupled with the extent to which the model is familiar with the language it contains. Over a wide range of tasks, we show that the lower the perplexity of the prompt is, the better the prompt is able to perform the task. As a result, we devise a method for creating prompts: (1) automatically extend a small seed set of manually written prompts by paraphrasing using GPT3 and backtranslation and (2) choose the lowest perplexity prompts to get significant gains in performance.
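Step (2) of the method, choosing the lowest-perplexity prompts, can be sketched as follows. The sketch assumes that per-token natural-log probabilities for each candidate prompt have already been obtained from some language model; how they are obtained is outside the sketch. The function names and the example values are hypothetical.

```python
import math

def perplexity(token_logprobs):
    """Perplexity is exp of the negative mean per-token log-probability (natural log)."""
    return math.exp(-sum(token_logprobs) / len(token_logprobs))

def select_lowest_perplexity(prompt_logprobs, k=1):
    """Return the k prompts with the lowest perplexity, lowest first.

    prompt_logprobs maps each prompt string to its list of token log-probabilities.
    """
    ranked = sorted(prompt_logprobs, key=lambda p: perplexity(prompt_logprobs[p]))
    return ranked[:k]
```

A caller would build the `prompt_logprobs` mapping from the seed prompts plus their paraphrases and keep only the top `k` for the downstream task.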

GAMMA: Generative Augmentation for Attentive Marine Debris Detection

Vaishnavi Khindkar, Janhavi Khindkar

Category: Computer Vision

2022-12-07

We propose an efficient, generative augmentation approach to address the scarcity of underwater debris data for visual detection. We use CycleGAN as a data augmentation technique to convert openly available, abundant data of terrestrial plastic to underwater-style images. Prior works focus only on augmenting or enhancing existing data, which moreover adds bias to the dataset. In contrast, our technique introduces variation by transforming additional in-air plastic data to the marine background. We also propose a novel architecture for underwater debris detection using an attention mechanism. Our method helps to focus only on relevant instances of the image, thereby enhancing detector performance, which is highly desirable when detecting marine debris using an Autonomous Underwater Vehicle (AUV). We perform extensive experiments for marine debris detection using our approach. Quantitative and qualitative results demonstrate the potential of our framework, which significantly outperforms the state-of-the-art methods.

Fully Bayesian inference for latent variable Gaussian process models

Suraj Yerramilli, Akshay Iyer, Wei Chen, Daniel W. Apley

Category: Machine Learning (Statistics) | Machine Learning

2022-11-04

Real engineering and scientific applications often involve one or more qualitative inputs. Standard Gaussian processes (GPs), however, cannot directly accommodate qualitative inputs. The recently introduced latent variable Gaussian process (LVGP) overcomes this issue by first mapping each qualitative factor to underlying latent variables (LVs), and then using any standard GP covariance function over these LVs. The LVs are estimated similarly to the other GP hyperparameters through maximum likelihood estimation, and then plugged into the prediction expressions. However, this plug-in approach does not account for uncertainty in the estimation of the LVs, which can be significant, especially with limited training data. In this work, we develop a fully Bayesian approach for the LVGP model and for visualizing the effects of the qualitative inputs via their LVs. We also develop approximations for scaling up LVGPs and fully Bayesian inference for the LVGP hyperparameters. We conduct numerical studies comparing plug-in inference against fully Bayesian inference over a few engineering models and material design applications. In contrast to previous studies on standard GP modeling that have largely concluded that a fully Bayesian treatment offers limited improvements, our results show that for LVGP modeling it offers significant improvements in prediction accuracy and uncertainty quantification over the plug-in approach.
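The LVGP construction described here, mapping each qualitative level to latent variables and then applying a standard covariance function, can be sketched in a few lines. This is an illustration under assumptions: the latent positions below are hand-picked placeholders (in LVGP they are estimated from data, by maximum likelihood or by the Bayesian treatment above), the level names are invented, and a squared-exponential kernel stands in for "any standard GP covariance function".

```python
import math

# Hypothetical 2-D latent positions for one qualitative input with three levels.
# In an actual LVGP these positions are estimated, not fixed by hand.
LATENT = {"steel": (0.0, 0.0), "aluminum": (0.3, 0.1), "rubber": (2.0, 1.5)}

def lvgp_kernel(x1, level1, x2, level2, lengthscale=1.0, latent=LATENT):
    """Squared-exponential covariance over a numeric input and the mapped latent variables."""
    z1, z2 = latent[level1], latent[level2]
    sq_numeric = ((x1 - x2) / lengthscale) ** 2
    sq_latent = sum((a - b) ** 2 for a, b in zip(z1, z2))
    return math.exp(-0.5 * (sq_numeric + sq_latent))
```

Levels whose latent positions lie close together (here "steel" and "aluminum") receive a higher covariance than distant ones, which is how the latent space encodes similarity between qualitative levels.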

Low-Stabilizer-Complexity Quantum States Are Not Pseudorandom

Sabee Grewal, Vishnu Iyer, William Kretschmer, Daniel Liang

Category: Machine Learning

2022-09-29

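We show that quantum states with "low stabilizer complexity" can be efficiently distinguished from Haar-random states. Specifically, given an $n$-qubit pure state $|\psi\rangle$, we give an efficient algorithm that distinguishes whether $|\psi\rangle$ is (i) Haar-random or (ii) a state with stabilizer fidelity at least $\frac{1}{k}$ (i.e., it has fidelity at least $\frac{1}{k}$ with some stabilizer state), promised that one of these is the case. With black-box access to $|\psi\rangle$, our algorithm uses $O\!\left(k^{12} \log(1/\delta)\right)$ copies of $|\psi\rangle$ and $O\!\left(n k^{12} \log(1/\delta)\right)$ time to succeed with probability at least $1-\delta$, and, with access to a state preparation unitary for $|\psi\rangle$ (and its inverse), $O\!\left(k^{3} \log(1/\delta)\right)$ queries and $O\!\left(n k^{3} \log(1/\delta)\right)$ time suffice. As a corollary, we prove that $\omega(\log(n))$ $T$-gates are necessary for any Clifford+$T$ circuit to prepare computationally pseudorandom quantum states, a first-of-its-kind lower bound.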

A Simple Strategy to Provable Invariance via Orbit Mapping

Kanchana Vaishnavi Gandikota, Jonas Geiping, Zorah Lähner, Adam Czapliński, Michael Moeller

Category: Computer Vision

2022-09-24

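Many applications require robustness, or ideally invariance, of neural networks to certain transformations of the input data. Most commonly, this requirement is addressed by using adversarial training or by defining network architectures that include the desired invariance by design. In this work, we propose a method that makes network architectures provably invariant with respect to group actions by choosing one element from a (possibly continuous) orbit based on a fixed criterion. In a nutshell, we intend to "undo" any possible transformation before feeding the data into the actual network. Further, we empirically analyze the properties of different approaches that incorporate invariance via training or architecture, and demonstrate the advantages of our method in terms of robustness and computational efficiency. In particular, we investigate robustness with respect to rotations of images (which can hold up to discretization artifacts) as well as the provable orientation and scaling invariance of 3D point cloud classification.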
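The idea of selecting one element of an orbit by a fixed criterion can be illustrated on a toy group. The sketch below assumes the group of cyclic shifts acting on a 1-D sequence and uses "lexicographically smallest shift" as the fixed criterion; both choices are stand-ins chosen for brevity, not the image-rotation or point-cloud setting studied in the paper. The function names are hypothetical.

```python
def canonicalize(seq):
    """Map a sequence to a fixed representative of its orbit under cyclic shifts."""
    seq = list(seq)
    if not seq:
        return seq
    # Enumerate the whole orbit and pick one element by a fixed criterion
    # (lexicographic minimum); every shifted input has the same orbit.
    return min(seq[i:] + seq[:i] for i in range(len(seq)))

def invariant_model(seq, model):
    """Wrap any model so its output is invariant to cyclic shifts of the input."""
    return model(canonicalize(seq))
```

Because every shifted version of an input maps to the same representative, the wrapped model is invariant by construction even when `model` itself is not, for instance when `model` depends on the order of the entries.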
